The linear preferential attachment hypothesis has been quite successful in explaining the
existence of networks with power-law degree distributions. It is therefore important to determine
whether this mechanism is the consequence of a general principle based on local rules. In this work
it is claimed that an effective linear preferential attachment is the natural outcome of growing
network models based on local rules. It is also shown that local models offer an explanation for
other properties, such as the clustering hierarchy and degree correlations recently observed in complex
networks. These conclusions are based on both analytical and numerical results for different local
rules, including some models already proposed in the literature.

PACS numbers: 89.75.-k,89.75.Hc,89.75.Fb,89.20.Hh,89.65.-s,87.15.Kg 



I. INTRODUCTION 

In the last few years there has been great interest
in the study of networks, with particular emphasis on
the following properties: the small-world effect, power-law
degree distributions, and, more recently, degree
correlations and clustering hierarchy.
This explosion has been possible thanks to the increase
of available network maps offering the graph representation
for a wide variety of systems, with sizes ranging
from hundreds to billions of nodes. Examples include
technological networks such as the physical Internet,
the World Wide Web [18, 19, 20], electronic mail [21, 22],
and electronic circuits [23]; biological networks such as
the protein-protein interaction network, metabolic paths,
and food webs [31, 32]; and social networks represented
by the citation graph, scientific collaboration webs, and
sexual relations, among others.

In particular metrics like the degree (the number of 
edges incident to a vertex), the minimum path distance 
between pairs of vertices and the clustering coefficient 
(the fraction of edges among the neighbors of a vertex) 
have attracted the attention of the physics community. 
Watts and Strogatz have shown that, in general,
real networks are characterized by a small average minimum
path distance and a large clustering coefficient, which
together are termed the small-world effect. The name
comes from the fact that we can reach every vertex in
the graph by crossing a small number of edges. Moreover,
Barabasi and collaborators have pointed out that
many real networks are also characterized by power-law
degree distributions, giving an appreciable probability to
observe high degree vertices. A more exhaustive analysis
reveals that, in addition to power laws, truncated power
laws and exponential distributions are also observed.

Barabasi and Albert (BA) proposed a mechanism that
explains the origin of power-law degree distributions [41].
This mechanism is based on two fundamental properties
of a wide class of real networks: their growing nature
and the existence of a preferential attachment, whereby new
vertices added to the graph are attached preferentially to
high degree vertices. In particular a linear preferential
attachment, where the probability of connecting to a
vertex is proportional to its degree, leads to power-law
degree distributions. The preferential attachment mechanism
can be generalized in different ways. A sub-linear
preferential attachment leads to bounded degree distributions,
while a super-linear one yields graphs with a single
hub connected to almost every other vertex [45].
The power laws can also be truncated after the introduction
of other ingredients such as aging, bounded
capacity, or limited information. Moreover, the
introduction of quenched and annealed disorder
leads to logarithmic corrections and multi-fractal
scaling, respectively.

The BA model provides a general mechanism to obtain 
power law degree distributions in growing networks. If 
one considers other measures like the clustering coefficient
then one may conclude that this model is still insufficient
to describe real graphs. However, we should not focus on
the detailed properties of the model but on its philosophy.
That is, if we assume that the network has a tendency to
grow and that there is an effective linear preferential
attachment, then we obtain a scale-free degree distribution.
Actually, this effective preferential attachment has
been measured in different real graphs, including the
Internet and a variety of scientific collaboration webs,
supporting the hypothesis of a linear attachment rate.
With regard to the other topological properties,
we can construct many models with different clustering
coefficients, minimum path distances, and other
metrics [53]. However, the origin of the ubiquity of the
linear preferential attachment is not yet clear.

The topology of real networks is also characterized by
degree correlations and clustering hierarchy.
Moreover, these correlations influence the behavior of
models defined on top of these graphs, as has been
recently shown. Growing
network models with global evolution rules, like the BA
model, exhibit degree correlations. For instance, nontrivial
degree correlations have been obtained in the linear
preferential attachment model [45] and in a growing
network model without any preferential attachment.
However, the degree correlations obtained in those global
models are not sufficiently strong to account for the features
observed in real graphs. New models giving a better
representation of real graphs are starting to emerge.
In addition to numerical simulations, some
analytical treatments have shown that power-law degree
distributions and clustering hierarchy are obtained as an
outcome of these models. However,
a general principle based on local rules is still missing.

In this work different local mechanisms that lead to 
graphs with power-law degree distributions, degree cor- 
relations and clustering hierarchy are studied. The term 
local means that we will investigate evolution rules that 
involve a vertex and its neighbors. As will be shown,
preferential attachment, the inverse proportionality
between the average clustering coefficient and the vertex 
degree, and degree correlations are common features of 
growing graph models built by local rules. The general 
principles behind these features are also determined. 

The paper is organized as follows. In Section II
the motivation for this work is presented. It is shown
that, in addition to power-law degree distributions, clustering
hierarchy and degree correlations are common features
of real networks. In the following sections
three different models based on local rules are presented.
In all cases both analytical and numerical evidence is
provided. In particular, in Section III a walk model is
proposed as a mechanism for searchable networks such
as the WWW and the citation network. Then in Section IV
a model for social network evolution is analyzed,
based on the existence of potential connections between
the neighbors of a vertex. Finally, in Section V we study
models with duplication or replication of vertices. The
common patterns observed in these models are summarized
in the concluding Section VI.



II. CORRELATIONS AND HIERARCHY IN 
REAL GRAPHS 

In this section we study correlations in some real
graphs. In particular we consider six different networks,
here denoted by Router, AS, WWW, Gnutella, PIN and
Math. In all cases the graph is obtained by representing
the "relevant" units of the system by vertices and
their interactions or relations by edges. In some cases,
multiple graph representations of the same system can
be obtained. Router is the router level graph representation
of the Internet, where each vertex represents
a router and each edge represents a physical connection
between two of them. AS is the autonomous system (AS)
representation of the Internet, where each vertex represents
an AS or service provider and each edge represents a peer
relation between them. WWW is the graph representation
of the WWW, where each vertex represents a web
page and each directed edge a hyperlink from one page
to another. Here we will treat the directed edges as
undirected. Gnutella is the graph representation of the
peer-to-peer network of the same name, where each vertex
represents a user and each edge a peer relation between
them. PIN is the graph representation of the protein
interaction network, where each vertex represents a protein
and each edge an interaction between them. Math is the
graph representation of the mathematical co-authorship
network, where each vertex represents an author and each
edge the existence of at least one common publication
between them.

In general, real networks are correlated and correla- 
tions may have different origins. Let us consider the ex- 
ample of the Internet. Due to installation costs, the Inter- 
net has been designed with a hierarchical structure. This 
hierarchy can be schematically divided in international 
connections, national backbones, regional networks, and 
local area networks. Vertices providing access to international
connections or national backbones are of course at
the top level of this hierarchy, since they make possible the
communication between regional and local area networks.
Moreover, in this way, a small average minimum path dis- 
tance can be achieved with a small average degree. This 
hierarchical structure will introduce some correlations in 
the network topology. For instance, it is expected that 
vertices with high degrees are connected to vertices with 
low degrees. 

By contrast, in social networks well connected people
tend to be connected with other well connected people.
Let us take the example of the scientific co-authorship
graph. A scientist who writes a lot of papers has in general
a larger probability of writing a paper with another scientist
who also has a lot of papers than with one with
few papers. In fact, if $F_i$ is the number of papers of scientist
$i$ and $F_i \ll N$, then the probability that two scientists
$i$ and $j$ write a paper together is roughly $F_i F_j / N$. Now, $F_i$ is in
general a monotonically increasing function of the scientist's
degree $d_i$ (number of collaborators) and, therefore,
scientists with a high degree will have a better chance of
writing a new article together, i.e. of being connected.

To investigate these correlations it has been proposed
to analyze the clustering coefficient and the nearest
neighbor average connectivity as a function of the vertex
degree. The clustering coefficient is the average
probability that two neighbors $l$ and $m$ of a vertex $i$ are
connected. In terms of the adjacency matrix ($J_{ij} = 1$ if
vertices $i$ and $j$ are connected and $0$ otherwise) the clustering
coefficient is defined as the conditional probability
that if $J_{il} J_{im} = 1$ then $J_{lm} = 1$. Thus, it measures in
some way the existence of three-point correlations in the
adjacency matrix. The clustering coefficient $c_i$ is then defined
as the ratio between the number of edges $e_i$ among
the $d_i$ neighbors of a given vertex $i$ and its maximum
possible value, $d_i(d_i - 1)/2$, i.e.

$$c_i = \frac{2 e_i}{d_i (d_i - 1)}. \tag{1}$$
FIG. 1: Clustering coefficient as a function of the vertex degree for some real graphs. AS and Router are the autonomous system
and router level graph representations of the Internet, respectively. WWW is a sub-graph of the WWW network, a data
set collected by the Notre Dame group of Complex Networks (http://www.nd.edu/~networks). Gnutella is the Gnutella
peer-to-peer network, provided by Clip2 Distributed Search Solutions. PIN is the protein-protein interaction graph of Saccharomyces
cerevisiae as obtained from two-hybrid experiments. Math is the co-authorship graph obtained from all relevant journals
in the field of mathematics published in the period 1991-1998 [39].



The average clustering coefficient $\langle c \rangle$ is the average of $c_i$
over all vertices in the graph. It provides a measure of
how well the neighbors of a vertex are locally interconnected.
It has been shown that the clustering
coefficient of many graphs representing real systems
is orders of magnitude larger than the one expected for
a random graph and, therefore, they are far from being
random. Further information can be extracted if one
computes it as a function of the vertex degree.
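The definition of $c_i$ and its average over vertices of equal degree can be illustrated with a minimal Python sketch. The function names, the dictionary-of-sets graph representation, the toy graph, and the convention $c_i = 0$ for vertices with fewer than two neighbors are assumptions made here for illustration; they are not taken from the data analysis of this paper.

```python
from itertools import combinations


def local_clustering(adj):
    """Return {i: c_i} with c_i = 2 e_i / (d_i (d_i - 1)), as in Eq. (1).

    adj maps each vertex to the set of its neighbors (undirected graph).
    Vertices with d_i < 2 get c_i = 0 (a convention assumed here).
    """
    c = {}
    for i, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            c[i] = 0.0
            continue
        # e_i: number of edges among the neighbors of vertex i
        e = sum(1 for l, m in combinations(nbrs, 2) if m in adj[l])
        c[i] = 2.0 * e / (d * (d - 1))
    return c


def clustering_by_degree(adj):
    """Average c_i over all vertices that share the same degree d."""
    c = local_clustering(adj)
    total, count = {}, {}
    for i, nbrs in adj.items():
        d = len(nbrs)
        total[d] = total.get(d, 0.0) + c[i]
        count[d] = count.get(d, 0) + 1
    return {d: total[d] / count[d] for d in total}


# Hypothetical toy graph: triangle 0-1-2 with a pendant vertex 3 attached to 0.
toy = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(clustering_by_degree(toy))
```

In the toy graph, vertex 0 has three neighbors with a single edge among them, so $c_0 = 1/3$, while vertices 1 and 2 have $c = 1$.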

In Fig. 1 we plot $\langle c \rangle_d$ vs. $d$ for different real networks.
According to this measure, two different classes emerge.
In the first class (Math and Router data), $\langle c \rangle_d$ does not
exhibit a strong dependence on $d$, except for finite size
effects at the largest degrees. This behavior is typical of
random graphs, where the probability that two neighbors
of a vertex are connected by an edge is a constant,
equal to the probability that any two vertices selected at
random are connected. By contrast, there is another
class where $\langle c \rangle_d$ decays markedly with increasing
vertex degree $d$. Thus, in this case, low degree
vertices form local sub-graphs that are well connected.
At the same time they are connected to other parts of
the graph by high degree vertices, which have few edges
between the subgraphs they connect but yield a small
average minimum path distance. This picture makes evident
the existence of some hierarchy or modularity.

These observations for the clustering coefficient are
complemented by another metric related to the correlations
between vertex degrees. These correlations are
quantified by the probability $p(d'|d)$ that a vertex with
degree $d$ has an edge to a vertex with degree $d'$. With the
available data a plot of this quantity is very noisy
and difficult to interpret. Thus in [5] it was suggested to
measure the average degree among the nearest neighbors
of a vertex, which is given by

$$\langle d_{nn} \rangle_d = \sum_{d'} d' \, p(d'|d), \tag{2}$$

and to plot it as a function of the vertex degree $d$. If
there are no degree-degree correlations then the probability
that an edge points to a vertex of degree $d'$ is
independent of $d$ and proportional to $d' p_{d'}$, resulting,
after normalization, in $p(d'|d) = d' p_{d'} / \langle d \rangle$. Therefore, the
plot of $\langle d_{nn} \rangle_d$ vs. $d$ will be flat and equal to

$$\langle d_{nn} \rangle_d = \frac{\langle d^2 \rangle}{\langle d \rangle}. \tag{3}$$

FIG. 2: Average nearest-neighbors degree as a function of the vertex degree for the real graphs introduced in Fig. 1.
In Fig. 2 we plot $\langle d_{nn} \rangle_d$ vs. $d$ for several real networks.
Also in this case we find the emergence of two different
classes of graphs. In one of them the average nearest
neighbor degree exhibits a power-law decay with increasing
vertex degree. This is strong evidence of the existence
of disassortative (or negative) correlations, where
large degree vertices tend to be connected with low degree
ones and vice versa. On the other hand, for some of the
graphs (Math and Router data) an increasing tendency
is observed, denoting the presence of assortative (or positive)
correlations, where the edges connect vertices with
similar degrees. The same conclusions are obtained using
the Pearson coefficient of the degrees at either end of an
edge. Notice that the subdivisions according to
the clustering coefficient and the average nearest-neighbor
degree coincide.
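The average nearest-neighbor degree of Eq. (2) and the flat value $\langle d^2 \rangle / \langle d \rangle$ expected without degree-degree correlations can be computed directly from an adjacency structure. The sketch below is only illustrative: the function names, the dictionary-of-sets representation and the star graph are assumptions, not part of the original analysis.

```python
def neighbor_degree_by_degree(adj):
    """Return {d: <d_nn>_d}, the mean neighbor degree of vertices of degree d.

    Averaging the per-vertex mean over vertices of degree d equals the sum
    over d' of d' p(d'|d) in Eq. (2), since each such vertex has d edges.
    """
    total, count = {}, {}
    for i, nbrs in adj.items():
        d = len(nbrs)
        if d == 0:
            continue  # isolated vertices have no neighbors to average over
        mean_nn = sum(len(adj[j]) for j in nbrs) / d
        total[d] = total.get(d, 0.0) + mean_nn
        count[d] = count.get(d, 0) + 1
    return {d: total[d] / count[d] for d in total}


def uncorrelated_value(adj):
    """Flat value <d^2>/<d> expected when degrees are uncorrelated."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    return sum(d * d for d in degrees) / sum(degrees)


# Hypothetical example: a star with hub 0 and three leaves.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(neighbor_degree_by_degree(star), uncorrelated_value(star))
```

The star is an extreme disassortative case: the hub only sees leaves and the leaves only see the hub, so $\langle d_{nn} \rangle_d$ decreases with $d$ and differs from the uncorrelated value.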



These observations cover a wide range of networks and
are complemented by other studies in the literature. However,
their origin is not yet clear. After some years of intensive
research on complex networks there is no explanation
for the ubiquity of the linear preferential
attachment. Different models have been proposed but a
general mechanism is still missing. The lack of a general principle
extends to these new metrics associated with correlations.
In the following sections three different models
that exhibit these properties are studied, with emphasis on
the mechanism behind them. Based on their analysis
some general conclusions will be drawn.
III. RANDOM WALK ON A NET

In this section we study the evolution of a graph where 
we know about new vertices by simply exploring the 
graph, with applications to searchable networks such as 
the citation and WWW graphs. We will focus on differ- 
ent local mechanisms, where the term local means that 
we will investigate evolution rules that involve a vertex 
and its neighbors. A global approach based on effective
attachment rates can be found in Ref. [68].

There are different ways to obtain information about
the documents (articles, web pages) in these graphs, such as
looking at directories (citation index, web crawler),
advertisements, recommendations from a friend, or following the
references (citations, hyper-links) contained in the
documents that we already know. In the case of the citation
graph, we often find new articles from the citation
list of an article that we already know and, later on, we
can repeat the process with these new articles. Moreover,
it is known that with a high probability people learn
about new web pages by surfing the WWW.

Two of the major ways in which people find
out about new web pages are following the hyper-links
of other web pages and using search engines [69]. The
first source can be characterized by modeling the WWW
"surfers" as random walkers on the WWW graph. Let
us assume that the walk starts from a page selected at
random and that, on each page, the surfer decides with
probability $q_e$ to follow one link on that page, or with
probability $1 - q_e$ to jump to another random page. Then,
the probability $v_i$ that a page $i$ will be visited is given by

$$v_i = \frac{1 - q_e}{N} + q_e \sum_j \frac{J_{ij}}{d_j^{\mathrm{out}}} v_j, \tag{4}$$

where $J_{ij}$ is the adjacency matrix and $d_j^{\mathrm{out}}$ denotes the
vertex out-degree. It is quite interesting to notice that
this probability of being visited by a random surfer is often
used by search engines as a page rank criterion [70], as
is the case for the popular Google. Hence, the two
main sources through which new pages are visited are
characterized by Eq. (4) and, therefore, the main properties
of the in-degree distribution of the WWW graph
should be computed starting from it. However, to my
knowledge, and except for the recursive search model
proposed by the author in Ref. [72], no study has been
performed in this direction.
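The self-consistent visiting probability of Eq. (4), $v_i = (1 - q_e)/N + q_e \sum_j J_{ij} v_j / d_j^{\mathrm{out}}$, can be evaluated by simple fixed-point iteration. The following is a minimal sketch under stated assumptions: the function name, the dictionary-of-lists representation (`out_links[j]` lists the pages that page `j` points to), the fixed number of iterations and the three-page example are hypothetical choices, not part of the original work.

```python
def visit_probability(out_links, q_e, n_iter=200):
    """Iterate v_i = (1 - q_e)/N + q_e * sum_{j -> i} v_j / d_j^out.

    out_links[j] is the list of pages that page j links to (assumed
    convention). Pages without out-links simply pass nothing on, so the
    v_i sum to one only if every page has at least one out-link.
    """
    nodes = list(out_links)
    n = len(nodes)
    v = {i: 1.0 / n for i in nodes}
    for _ in range(n_iter):
        new_v = {i: (1.0 - q_e) / n for i in nodes}
        for j, targets in out_links.items():
            if not targets:
                continue
            share = q_e * v[j] / len(targets)  # follow one out-link at random
            for i in targets:
                new_v[i] += share
        v = new_v
    return v


# Hypothetical three-page graph: 0 -> 2, 1 -> 2, 2 -> 0.
links = {0: [2], 1: [2], 2: [0]}
v = visit_probability(links, q_e=0.5)
print(v)
```

Page 1 has no in-links and keeps only the random-jump contribution $(1 - q_e)/N$, while page 2, with two in-links, is the most visited, which is the linear dependence on in-degree exploited in the mean-field step.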

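In a mean-field approximation one can replace the sum
in Eq. (4) by $\Theta d_i^{\mathrm{in}}$, resulting in

$$v_i = \frac{1 - q_e}{N} + q_e \Theta d_i^{\mathrm{in}}, \tag{5}$$

where $\Theta$ is the average probability that a vertex pointing
to vertex $i$ is visited and $d_i^{\mathrm{in}}$ is the vertex in-degree. To
compute $\Theta$ we should take into account that the probability
that a vertex $i$ has an in-edge coming from a vertex
with out-degree $d^{\mathrm{out}}$ is $d^{\mathrm{out}} p_{d^{\mathrm{out}}} / \langle d^{\mathrm{out}} \rangle$. This edge will be
selected at random among the $d^{\mathrm{out}}$ out-edges and, therefore,
with probability $1/d^{\mathrm{out}}$. Thus,

$$\Theta = \sum_{d^{\mathrm{out}}} \frac{d^{\mathrm{out}} p_{d^{\mathrm{out}}}}{\langle d^{\mathrm{out}} \rangle} \, \frac{1}{d^{\mathrm{out}}} \, v_{d^{\mathrm{out}}} = \frac{\langle v \rangle}{\langle d^{\mathrm{out}} \rangle}. \tag{6}$$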

In general when we visit a new page we do not necessarily create
a hyper-link to it. In a first approximation this can be
modeled by introducing a probability $q_v$ that a visited vertex
(page) increases its in-degree by one (a hyper-link is
created to it). Then, when a walk is performed, $\langle v \rangle N$ vertices
are visited and, therefore, $q_v \langle v \rangle N$ edges are added
on average, resulting in

$$\frac{\partial N}{\partial t} = \nu_a, \qquad \frac{\partial E}{\partial t} = \nu_s q_v \langle v \rangle N, \tag{7}$$

where $E$ is the number of edges, and $\nu_s$ and $\nu_a$ are the
number of surfers and the number of newly added pages
per unit time, respectively. The integration of these equations
yields

$$\langle d^{\mathrm{out}} \rangle = \langle d^{\mathrm{in}} \rangle = q_v \frac{\nu_s}{\nu_a} \langle v \rangle N. \tag{8}$$

Thus, from Eqs. (6) and (8) we finally obtain

$$\Theta = \frac{\nu_a}{q_v \nu_s N}. \tag{9}$$

The probability that the in-degree of a vertex of in-degree
$d^{\mathrm{in}}$ increases by one when a surfer walks on the
graph is given by $A(d^{\mathrm{in}}) = q_v v(d^{\mathrm{in}})$ and, therefore,
from Eqs. (5) and (9) it follows that

$$A(d^{\mathrm{in}}) = \frac{1}{N} \left[ q_v (1 - q_e) + q_e \frac{\nu_a}{\nu_s} d^{\mathrm{in}} \right]. \tag{10}$$

Notice that the walk on the graph leads to an effective
linear preferential attachment. The degree distribution
corresponding to this attachment rate can be easily obtained
using the rate equation approach [45]. Indeed,
the number of vertices $n_{d^{\mathrm{in}}}(t)$ with in-degree $d^{\mathrm{in}}$ satisfies
the rate equation

$$\frac{\partial n_{d^{\mathrm{in}}}}{\partial t} = \nu_s A_{d^{\mathrm{in}} - 1} n_{d^{\mathrm{in}} - 1} - \nu_s A_{d^{\mathrm{in}}} n_{d^{\mathrm{in}}} + \nu_a \delta_{d^{\mathrm{in}}, 0}. \tag{11}$$

Now we should take into account that the number of
vertices in the WWW graph grows exponentially and,
in such a case, $\nu_a \propto N$. Moreover, assuming that each
surfer has his or her own web page (or group of pages), the
number of surfers is expected to be proportional to the
number of web pages, i.e. $\nu_s \propto N$. Thus,

$$\frac{\nu_s}{\nu_a} = a, \tag{12}$$

where $a$ is a constant. It is worth noticing that Eq. (12)
is always satisfied for networks with a constant growth rate,
as may be the case for the citation graph. If this condition
is satisfied then the in-degree distribution reaches
a stationary state and we can write $n_{d^{\mathrm{in}}}(t) = N p_{d^{\mathrm{in}}}$,
where $p_{d^{\mathrm{in}}}$ is the stationary probability that a vertex has
in-degree $d^{\mathrm{in}}$. Substituting this expression in Eq. (11)
we obtain

$$p_{d^{\mathrm{in}}} = \frac{1}{1 + \alpha} \, \frac{\Gamma[\alpha(\gamma - 1) + d^{\mathrm{in}}]}{\Gamma[\alpha(\gamma - 1)]} \, \frac{\Gamma[(1 + \alpha)(\gamma - 1) + 1]}{\Gamma[(1 + \alpha)(\gamma - 1) + d^{\mathrm{in}} + 1]}, \tag{13}$$

where

$$\gamma = 1 + \frac{1}{q_e}, \qquad \alpha = a q_v (1 - q_e), \tag{14}$$

with the asymptotic behavior for large in-degree

$$p_{d^{\mathrm{in}}} \sim (d^{\mathrm{in}})^{-\gamma}. \tag{15}$$
Hence, the random walk model on a directed graph
leads to a power-law in-degree distribution, with an exponent
$\gamma > 2$. Notice that the power-law exponent does
not depend on $q_v$ and, therefore, we expect that generalizations
of the rule for creating an edge to a visited vertex
would not change this exponent. For instance, one can
divide the vertices into classes in such a way that edges
can only be created among vertices of the same class, and
the resulting power-law exponent should be the same.
Moreover, the power-law exponent does not depend on
$a$.
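The stationary distribution of Eq. (13), with $\gamma = 1 + 1/q_e$ from Eq. (14), can be checked numerically for normalization and for the asymptotic decay of Eq. (15). The sketch below evaluates the ratio of Gamma functions through their logarithms to avoid overflow; the function name and the parameter values $q_e = 0.5$, $\alpha = 1$ are illustrative assumptions, and the formula requires $\alpha > 0$.

```python
from math import exp, lgamma, log


def p_in(d, q_e, alpha):
    """Stationary in-degree distribution of Eq. (13), with gamma = 1 + 1/q_e.

    Uses log-Gamma functions so that large d does not overflow.
    Requires 0 < q_e <= 1 and alpha > 0.
    """
    gamma = 1.0 + 1.0 / q_e
    x = alpha * (gamma - 1.0)
    y = (1.0 + alpha) * (gamma - 1.0)
    log_ratio = lgamma(x + d) - lgamma(x) + lgamma(y + 1.0) - lgamma(y + d + 1.0)
    return exp(log_ratio) / (1.0 + alpha)


# Illustrative parameters (assumed): q_e = 0.5 gives gamma = 3.
q_e, alpha = 0.5, 1.0
norm = sum(p_in(d, q_e, alpha) for d in range(20000))
# Local log-log slope between d and 2d, which should approach -gamma.
slope = log(p_in(20000, q_e, alpha) / p_in(10000, q_e, alpha)) / log(2.0)
print(norm, slope)
```

For these parameters Eq. (13) reduces to $p_d = 12/[(d+2)(d+3)(d+4)]$, so the truncated sum is close to one and the local slope is close to $-3$.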

We can go beyond the in-degree distribution and compute
the clustering coefficient as a function of the total
degree $d = d^{\mathrm{in}} + d^{\mathrm{out}}$ of a vertex. For this purpose we
consider the graph as undirected and compute the number
$e_i$ of edges among the neighbors of a vertex $i$. Since
the only dynamics in this model is given by the random
walk, it follows that

$$\frac{\partial e_i}{\partial t} = q_v \left( q_e \Theta d_i^{\mathrm{in}} + q_e v_i \right). \tag{16}$$

The first term on the right hand side is the probability
that a vertex with an out-edge to $i$ is visited, and the
second is the probability that vertex $i$ is visited and the
walk follows one of its out-edges to visit an out-neighbor
vertex. In all cases the visited vertex is selected with
probability $q_v$. Using Eqs. (5), (9) and (10) and taking
into account that $\partial_t d_i^{\mathrm{in}} = A(d_i^{\mathrm{in}})$ we can rewrite Eq. (16) as

$$\frac{\partial e_i}{\partial t} = (1 + q_e) \frac{\partial d_i^{\mathrm{in}}}{\partial t}, \tag{17}$$

where we have neglected the first term on the right hand
side of Eq. (10). Integrating this equation with the
boundary condition $e(d^{\mathrm{in}} = 0) = 0$ we obtain the clustering
coefficient

$$\langle c \rangle_d = \frac{2 e(d)}{d(d - 1)} = \frac{2(1 + q_e)}{d} + \frac{2(1 + q_e)(1 - d^{\mathrm{out}})}{d(d - 1)}. \tag{18}$$

For large $d$ the clustering coefficient scales as

$$\langle c \rangle_d \approx \frac{2(1 + q_e)}{d}. \tag{19}$$
FIG. 3: In-degree distribution of the random walk model for
different values of the probability to continue the walk $q_e$ and
for graph size $N = 10^6$. In all cases we average over 100
realizations. The inset shows the exponent $\gamma$ obtained from
the fit to the power law $p_{d^{\mathrm{in}}} \sim (d^{\mathrm{in}})^{-\gamma}$ (circles) together with
the analytical prediction (continuous line).



Thus, we obtain an inverse proportionality between the 
clustering coefficient and the vertex degree. 



1. Random walk model 

We now study a particular random walk model by
means of numerical simulations and compare its properties
with the analytical results obtained above. We have
made some simplifications in order to reduce the number
of parameters and investigate the influence of the most
important parameter, $q_e$. The model is defined as follows.
Initial condition: starting with one vertex and an empty
set of edges, iteratively apply the following rules:

• Adding: A new vertex is created with an edge pointing
to one of the existing vertices, which is selected
at random.

• Walking: If an edge is created to a vertex in the
network then with probability $q_e$ an edge is also
created to one of its nearest neighbors. When no
edge is created go to the adding rule.
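The adding and walking rules can be sketched in a few lines of Python. This is not the code used for the simulations reported here; the function name, the list-based graph representation, and the interpretation that the walk follows out-edges (so that it always moves to older vertices and stops at a vertex without out-edges) are assumptions made for illustration.

```python
import random


def random_walk_model(n, q_e, seed=None):
    """Grow a directed graph with n vertices by the adding and walking rules.

    Returns (out, in_deg): out[i] lists the vertices that i points to and
    in_deg[i] is the in-degree of i. Assumption: the walk follows out-edges,
    which always lead to older vertices, so no vertex is visited twice.
    """
    rng = random.Random(seed)
    out = [[]]       # the initial vertex has no out-edges
    in_deg = [0]
    for new in range(1, n):
        # Adding rule: link to an existing vertex selected at random.
        target = rng.randrange(new)
        links = [target]
        in_deg[target] += 1
        # Walking rule: with probability q_e continue to one out-neighbor
        # of the last linked vertex and create an edge to it as well.
        while out[target] and rng.random() < q_e:
            target = rng.choice(out[target])
            links.append(target)
            in_deg[target] += 1
        out.append(links)
        in_deg.append(0)
    return out, in_deg


out, in_deg = random_walk_model(2000, q_e=0.5, seed=1)
print(max(in_deg), sum(in_deg) / len(in_deg))
```

A histogram of `in_deg` over many realizations and larger sizes would give the kind of in-degree distribution discussed in the text; the small size used here is only meant to keep the example fast.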

The first simplification is that there is only one "surfer"
in the network, i.e. $\nu_s = 1$. Second, each time the
"surfer" decides not to follow one of the edges of the
visited vertex it stops, and a new vertex starts a search
from a vertex selected at random. In other words, the
jump to a random vertex is coupled with the addition of
new vertices, resulting in $\nu_a = 1 - q_e$. Finally, each time
a vertex is visited an edge is created to it, thus $q_v = 1$.
Hence, the in-degree distribution is given by Eq. (13)
with

$$\gamma = 1 + \frac{1}{q_e}, \qquad \alpha = 1. \tag{20}$$

FIG. 4: Clustering coefficient as a function of vertex degree of
the random walk model, for different values of the probability
to continue the walk $q_e$ and for graph size $N = 10^6$. In all
cases we average over 100 realizations. The solid lines
correspond to the power-law decay $C(d) = 2(1 + q_e)/d$.

FIG. 5: Average neighbor degree as a function of vertex
degree of the random walk model, for different values of
the probability to continue the walk $q_e$ and for graph size
$N = 10^6$. In all cases we average over 100 realizations.

We have performed numerical simulations of this random walk
model up to graph sizes $N = 10^6$, averaging over 100
realizations. In Fig. 3 we show a log-log plot of the in-degree
distribution for different values of $q_e$. The power-law
decay for large in-degrees is evident. The exponent
$\gamma$ obtained from the fit to the numerical data is shown
in the inset, together with the predicted dependence in
Eq. (20). The analytical values overestimate the power-law
exponent but the qualitative picture is the same. For
$q_e \to 0$ the power-law exponent is so large that the degree
distribution cannot be distinguished from an exponential
distribution. By contrast, for $q_e \to 1$ it approaches
its minimum value $\gamma = 2$. We attribute the quantitative
disagreement to the mean-field approximation performed
in the step from Eq. (4) to Eq. (5). On the other hand, the
behavior of the average clustering coefficient with respect
to the vertex degree is shown in Fig. 4. In this case the
analytical asymptotic behavior in Eq. (19) is in very
good agreement with the numerical data.

We were not able to obtain a prediction for the scaling
of the average neighbor degree with the vertex degree. In
this case our analysis relies on numerical simulations. In
Fig. 5 we plot $\langle d_{nn} \rangle_d$ vs. $d$ for two values of $q_e$. For
$q_e = 0.3$, and for small values of $q_e$ in general, the average
neighbor degree does not exhibit a strong dependence on $d$
and, therefore, the graph appears uncorrelated. By
contrast, for $q_e = 0.5$, and in general for larger values
of $q_e$, it shows a peak around $d = 10$ and then decays
with increasing degree. This decay becomes even
faster with increasing $q_e$. We have not yet found an
explanation for this qualitative change of behavior. It is
worth noticing that the experimental data for the WWW
yield $\gamma \approx 2.1$, which can be obtained with our model
using $q_e > 0.5$. For this value of $q_e$ the model yields negative
correlations, in agreement with the real data presented
in Sec. II. However, we should take into account
that the above analysis includes the fluctuation properties
of the in-degree, while the statistics of the out-degree
were not considered. The latter is irrelevant for determining
the in-degree distribution but has to be taken into
account to determine the clustering and degree correlation
properties of the undirected representation of the
directed graph. Hence, the results obtained here for $\langle c \rangle_d$
and $\langle d_{nn} \rangle_d$ are not conclusive.



2. Recursive search model 

In the random walk model one follows only one edge
of the visited vertices. However, one may consider an
exhaustive search following all the edges recursively [72].
The main idea of a recursive search is thus to be connected
to one vertex of the network and, any time we get
in contact with a new vertex, to follow all its edges,
exploring in this way a larger part of the network. This can
be modeled by modifying the walking rule as follows:

• Walking: If an edge is created to a vertex in the
network then with probability $q_e$ an edge is also
created to each of its nearest neighbors. When no
edge is created go to the adding rule.
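The modified walking rule can be sketched as a stack-based exploration. As before, this is an illustrative sketch rather than the simulation code behind the figures: the function name, the restriction to out-neighbors, the independent trial with probability $q_e$ for every out-edge encountered, and the rule that a vertex is linked at most once by the same new vertex are all assumptions.

```python
import random


def recursive_search_model(n, q_e, seed=None):
    """Grow a directed graph with the recursive walking rule.

    Returns out, where out[i] is the sorted list of vertices that i points
    to. Assumptions: the search follows out-edges only, every out-edge of a
    linked vertex is followed independently with probability q_e, and each
    vertex receives at most one edge from the same new vertex.
    """
    rng = random.Random(seed)
    out = [[]]  # the initial vertex has no out-edges
    for new in range(1, n):
        first = rng.randrange(new)  # adding rule
        linked = {first}
        stack = [first]
        while stack:  # recursive walking rule
            u = stack.pop()
            for w in out[u]:
                if w not in linked and rng.random() < q_e:
                    linked.add(w)
                    stack.append(w)
        out.append(sorted(linked))
    return out


out_rs = recursive_search_model(1000, q_e=1.0, seed=3)
mean_out = sum(len(links) for links in out_rs) / len(out_rs)
print(mean_out)
```

For $q_e = 1$ every new vertex links to its first target and to everything reachable from it, so the out-neighborhood of each vertex is closed under following out-edges.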

As for the previous model we have $\nu_s = 1$ and $\nu_a = 1 - q_e$,
but $A(d^{\mathrm{in}})$ is not given by Eq. (10). The form of
$A(d^{\mathrm{in}})$, and consequently the in-degree distribution, is
determined below for two limiting cases.
FIG. 6: Log-log plot of the in-degree distribution of the recursive
search model for different values of $q_e$. The inset shows
the exponent $\gamma$ obtained from the power-law fit $p_{d^{\mathrm{in}}} \sim (d^{\mathrm{in}})^{-\gamma}$
to the numerical data.



$q_e = 0$: In this case only the adding rule is performed,
hence $A(d^{\mathrm{in}}) = 1/N$, independent of $d^{\mathrm{in}}$. The
fact that $A(d^{\mathrm{in}})$ scales as $N^{-1}$ has as a consequence
that $n_{d^{\mathrm{in}}}(N) = N p_{d^{\mathrm{in}}}$ is the stationary solution of Eq.
(11), where $p_{d^{\mathrm{in}}}$ is the stationary probability to find a
vertex with in-degree $d^{\mathrm{in}}$. Substituting this expression
in Eq. (11) one obtains

$$p_{d^{\mathrm{in}}} = 2^{-(d^{\mathrm{in}} + 1)}. \tag{21}$$

$q_e = 1$: Also for this limiting case the in-degree distribution
can be computed exactly. Let us determine $A(d^{\mathrm{in}})$
using the following fact. Any vertex $i$ with in-degree $d_i^{\mathrm{in}}$
has $d_i^{\mathrm{in}}$ vertices with an edge to it, which will be denoted
by $X_j$ ($j = 1, 2, \ldots, d_i^{\mathrm{in}}$). At the same time each of these
$X_j$ vertices may have other vertices with an edge to it.
The following result holds: any vertex with an edge to
any of the vertices $X_j$ also has an edge to $i$. The proof
is straightforward: if a newly added vertex creates
an edge to any of the vertices $X_j$ then, with probability
$q_e = 1$, it creates an edge to all the nearest neighbors of
$X_j$, among which is vertex $i$.
Hence, the probability that a newly added vertex creates
an edge to vertex $i$ is just the probability $(1 + d_i^{\mathrm{in}})/N$
that the first edge is connected to $i$ or to any of the $d_i^{\mathrm{in}}$
vertices with an edge to $i$, i.e. $A(d^{\mathrm{in}}) = (1 + d^{\mathrm{in}})/N$. As
for $q_e = 0$, $A(d^{\mathrm{in}})$ scales as $1/N$ and, therefore, the
stationary solution is of the form $n_{d^{\mathrm{in}}}(N) = N p_{d^{\mathrm{in}}}$. Then
from Eq. (11) it follows that

$$p_{d^{\mathrm{in}}} = \frac{1}{(d^{\mathrm{in}} + 1)(d^{\mathrm{in}} + 2)}. \tag{22}$$

Notice that also in this case, although it is not explicitly
assumed, there is a preferential attachment leading to the
power-law decay for large in-degrees, $p_{d^{\mathrm{in}}} \sim (d^{\mathrm{in}})^{-2}$.
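Both limiting distributions follow from the same stationary recursion. Writing $n_{d} = N p_{d}$ and $\partial_t N = \nu_a$ in Eq. (11) gives $p_d \, [1 + r N A(d)] = r N A(d - 1) \, p_{d-1} + \delta_{d,0}$ with $r = \nu_s / \nu_a$. The sketch below iterates this recursion for an arbitrary attachment rate; the function name is hypothetical, and taking $r = 1$ for both limiting cases is an assumption made here because it reproduces Eqs. (21) and (22).

```python
def stationary_distribution(rate, d_max, ratio=1.0):
    """Solve the stationary form of the rate equation (11) by recursion.

    rate(d) must return N * A(d), the attachment rate rescaled by N, and
    ratio stands for nu_s / nu_a (assumed equal to 1 by default).
    Returns the list [p_0, p_1, ..., p_{d_max}].
    """
    p = [1.0 / (1.0 + ratio * rate(0))]
    for d in range(1, d_max + 1):
        gain = ratio * rate(d - 1) * p[-1]      # flow from in-degree d - 1
        p.append(gain / (1.0 + ratio * rate(d)))
    return p


# q_e = 0: N A(d) = 1, expected p_d = 2^-(d + 1), Eq. (21).
p_exp = stationary_distribution(lambda d: 1.0, 10)
# q_e = 1: N A(d) = 1 + d, expected p_d = 1 / ((d + 1)(d + 2)), Eq. (22).
p_pow = stationary_distribution(lambda d: 1.0 + d, 50)
print(p_exp[:3], p_pow[:3])
```

The constant rate yields the exponential distribution and the linear rate yields the power law with exponent 2, which makes explicit that the preferential attachment alone is responsible for the change of behavior.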



The limiting cases $q_e = 0$ and $q_e = 1$ are described by in-degree distributions which are qualitatively different. For $q_e = 0$ the distribution is exponential with a finite average in-degree. On the contrary, for $q_e = 1$, the distribution follows a power law decay $p_{d^{in}} \sim (d^{in})^{-\gamma}$ for large $d^{in}$, with $\gamma = 2$. This power law decay goes up to the largest possible degree $d^{in} \sim N^{1/(\gamma-1)} \sim N$, while $p_{d^{in}} = 0$ for $d^{in} > N$. Hence, for $q_e = 1$ and large $N$ the average in-degree scales as

$$\langle d^{in}\rangle(N) = \langle d^{out}\rangle(N) = a + \ln N, \qquad (23)$$



where $a$ is independent of $N$, and clearly $\langle d^{in}\rangle$ diverges in the thermodynamic (large network size) limit. In a mean-field approximation one can neglect the existence of loops in the network and, in such a case, the "walking" rule will take place on a tree. Each vertex on the tree will have on average $\langle d^{out}\rangle(N)$ sons, which is just the average out-degree after $N$ vertices have been added. Moreover, if a vertex is visited then each of its sons will be visited with probability $q_e$. Hence, when the vertex $N + 1$ is added, its average out-degree $\langle d^{out}\rangle(N+1)$ will be given by the average number of vertices visited during the walk,

$$\langle d^{out}\rangle(N+1) = 1 + q_e\langle d^{out}\rangle(N) + \left[q_e\langle d^{out}\rangle(N)\right]^2 + \cdots = \frac{1}{1 - q_e\langle d^{out}\rangle(N)}. \qquad (24)$$



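If there is a stationary state then $\langle d^{out}\rangle(N+1) = \langle d^{out}\rangle(N) = \langle d^{out}\rangle$. In this case Eq. (24) yields two solutions. One of them diverges when $q_e \to 0$, which is not admissible since $\langle d^{out}\rangle = 1$ for $q_e = 0$. The other solution reads

$$\langle d^{out}\rangle = \langle d^{in}\rangle = \frac{1 - \sqrt{1 - 4q_e}}{2q_e}. \qquad (25)$$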



This solution is valid for $q_e < q_c = 1/4$ and, therefore, the average out-degree does not converge to a stationary value when $q_e > q_c$. In this last region the average out-degree increases logarithmically with $N$, as in the extreme case $q_e = 1$ (see Eq. (23)). Now, $\langle d^{in}\rangle = \langle d^{out}\rangle$ and both approach a stationary value for any $\gamma > 2$ and diverge otherwise. We then expect that the in-degree distribution has a power law exponent $\gamma > 2$ for $q_e < q_c$ and $\gamma \le 2$ for $q_e > q_c$. Moreover, taking into account that the fastest divergence is obtained for $q_e = 1$, where $\gamma = 2$, we conclude that for $q_e > q_c$ the power law exponent is constant and equal to $\gamma = 2$.
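The stationary condition can also be explored numerically. The sketch below, with illustrative function names, iterates the mean-field map of Eq. (24) and compares the outcome with the admissible root of Eq. (25). Returning infinity once the geometric series stops converging is a convention adopted for this sketch, not part of the original analysis.

```python
import math


def stationary_out_degree(q_e):
    """Admissible root of Eq. (25); it only exists for 0 < q_e <= 1/4."""
    if not 0 < q_e <= 0.25:
        raise ValueError("no stationary solution outside 0 < q_e <= 1/4")
    return (1 - math.sqrt(1 - 4 * q_e)) / (2 * q_e)


def iterate_mean_field(q_e, steps=200, d0=1.0):
    """Iterate <d>(N+1) = 1 / (1 - q_e <d>(N)), starting from d0."""
    d = d0
    for _ in range(steps):
        x = q_e * d
        if x >= 1:            # geometric series of Eq. (24) no longer converges
            return math.inf
        d = 1 / (1 - x)
    return d


below = iterate_mean_field(0.2)   # q_e < 1/4: settles on the root of Eq. (25)
above = iterate_mean_field(0.3)   # q_e > 1/4: the map runs away
```

For $q_e = 0.2$ the iteration settles on the value given by Eq. (25), while for $q_e = 0.3$ it leaves the domain of the map after a few steps, in line with the threshold $q_c = 1/4$.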

To investigate the behavior for $0 < q_e < 1$ and the existence of a nontrivial threshold $q_c$ as predicted by the mean-field approach, we have made numerical simulations of the recursive search model for different values of $q_e$ up to graph sizes $N = 10^5$. For each value of $q_e$ the in-degree distribution was averaged over 100 runs of the algorithm. The resulting in-degree distribution is shown
in Fig. ©. For q e = 0.1 the decay for large in-degrees 
is very fast, and can be fitted by a power law decay with a very large exponent or, equivalently, by an exponential decay. On the contrary, for larger $q_e$ the exponent becomes smaller and the power law behavior becomes more evident. Finally, for $q_e > q_c = 0.5 \pm 0.1$ the exponent becomes independent of $q_e$ and equals $\gamma = 2$, in agreement with the mean-field prediction. However, the numerical threshold is two times the value obtained from Eq. (25).
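A simulation of this kind can be sketched as follows. The rules of the recursive search model are defined earlier in the paper and are only paraphrased in this section, so the code rests on an assumed reading: each new vertex links to a uniformly chosen existing vertex and then, recursively, follows every out-edge of each vertex it reaches with probability $q_e$, linking to all the vertices reached. The function names, the seeding and the depth-first traversal order are choices of this sketch.

```python
import random
from collections import Counter


def recursive_search_graph(n, q_e, seed=0):
    """Grow a directed graph; out[v] is the set of vertices v points to."""
    rng = random.Random(seed)
    out = [set()]                        # vertex 0 starts with no out-edges
    for new in range(1, n):
        start = rng.randrange(new)       # first edge: uniformly chosen target
        visited = {start}
        stack = [start]
        while stack:
            v = stack.pop()
            for w in out[v]:
                # Follow each out-edge of a visited vertex with probability q_e.
                if w not in visited and rng.random() < q_e:
                    visited.add(w)
                    stack.append(w)
        out.append(visited)              # the new vertex links to all visited
    return out


def in_degree_counts(out):
    """Map each in-degree value to the number of vertices having it."""
    indeg = Counter()
    for targets in out:
        indeg.update(targets)
    return Counter(indeg[v] for v in range(len(out)))
```

With $q_e = 0$ every new vertex keeps a single out-edge, and with $q_e = 1$ the out-neighborhood of each vertex is closed under following edges, which is the property used above to derive $A(d^{in}) = (1 + d^{in})/N$.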

In ordinary critical phenomena the absence of any typical length scale takes place at the critical point, which is observed at a precise value of the control parameter. For the present model, however, the absence of a characteristic in-degree is not only manifested at a precise value of $q_e$ but in the whole interval $q_c < q_e < 1$. These features are very similar to those observed in some sandpile models [73, 74], the paradigm of self-organized critical systems [75, 76]. As in these models [73, 74], there is a time scale separation between the addition of new vertices and their "walk" through the network. In the thermodynamic limit ($N \to \infty$) the phase diagram of the model is divided into a sub-critical ($0 < q_e < q_c$) and a critical region ($q_c < q_e < 1$), where the power law exponent does not depend on the control parameter. Hence, the results presented here suggest that for $q_c < q_e < 1$ the present model is in a self-organized critical state.



IV. CONNECTING NEAREST-NEIGHBORS 

In social graphs it is more probable that two vertices 
with a common neighbor get connected than two vertices 
chosen at random [52]. Clearly this property leads to a 
large average clustering coefficient since it increases the 
number of connections between the neighbors of a ver- 
tex, as it has been already observed in a model proposed 
by Davidsen, Ebel and Bornholdt (DEB) [fjj. The basic 
assumption of their model is that the evolution of social 
connections is mainly determined by the creation of new 
relations between pairs of individuals with a common 
friend. Moreover, a similar mechanism was considered 
by Holme and Kim [(5lJ and by Gin et al [3^| to intro- 
duce an appreciable clustering coefficient in preferential 
attachment models. 

The study of these models has been mainly performed 
by numerical simulations. A deeper analytical under- 
standing can be obtained by introducing the concept of 
potential edge. We will say that a pair of vertices is con- 
nected by a potential edge if 

1. they are not connected by an edge and 

2. they have at least one common neighbor. 

Notice that while this concept has been implicitly considered in previous works, its mathematical description is introduced here for the first time.
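To make the definition concrete, here is a minimal sketch that lists the potential edges of a small undirected graph. The adjacency representation (a dictionary of neighbor sets) and the function name are assumptions made for illustration.

```python
from itertools import combinations


def potential_edges(adj):
    """Return the set of potential edges of an undirected graph.

    `adj` maps each vertex to the set of its neighbors. A pair (i, j) with
    i < j is a potential edge if i and j are not connected by an edge and
    share at least one common neighbor.
    """
    pot = set()
    for nbrs in adj.values():
        # Every pair of neighbors of the same vertex has a common neighbor.
        for i, j in combinations(sorted(nbrs), 2):
            if j not in adj[i]:
                pot.add((i, j))
    return pot


# Vertex 1 is linked to 0, 2 and 3; vertices 2 and 3 are also linked.
example = {0: {1}, 1: {0, 2, 3}, 2: {1, 3}, 3: {1, 2}}
pairs = potential_edges(example)
```

In this example vertex 0 has degree 1 and potential degree 2, since it shares the neighbor 1 with both vertex 2 and vertex 3.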

The graph dynamics will be defined by the transition rates between the three possible states of a pair of vertices: disconnected ($s$), connected by a potential edge ($p$), or connected by an edge ($e$). Let $d_i^*$ be the number of potential edges incident to vertex $i$, the potential degree for short. We could write the rate equations for the evolution of the number of vertices with degree $d$ and potential degree $d^*$. Instead we will use the continuum approach [80, 81]. In this case we neglect fluctuations and write mean-field equations for the evolution of $d_i$ and $d_i^*$,

$$\frac{\partial d_i}{\partial N} = \nu_{s\to e}\hat{d}_i + \nu_{p\to e} d_i^* - (\nu_{e\to s} + \nu_{e\to p}) d_i,$$
$$\frac{\partial d_i^*}{\partial N} = \nu_{s\to p}\hat{d}_i + \nu_{e\to p} d_i - (\nu_{p\to s} + \nu_{p\to e}) d_i^*, \qquad (26)$$
$$\hat{d}_i = N - d_i - d_i^*.$$

Here $\nu_{x\to y}$ is the transition rate from state $x$ to state $y$ per unit of $N$ and $\hat{d}_i$ is the number of remaining vertices, those connected to vertex $i$ neither by a potential edge nor by an edge.

The creation (deletion) of a potential edge incident to a vertex is associated with the creation (deletion) of an edge incident to one of its neighbors. For instance, if a new vertex $i$ is connected to an existing vertex $j$ then a potential edge is created between $i$ and all neighbors of $j$. Hence

$$\nu_{s\to p} = \nu_{s\to e} d_i, \qquad \nu_{p\to s} = \nu_{e\to s} d_i. \qquad (27)$$



These equalities are at the core of the connecting nearest- 
neighbors model. 

In the following we will neglect any process where an edge is deleted, i.e.

$$\nu_{e\to s} = \nu_{e\to p} = 0. \qquad (28)$$



This assumption may seem too crude for some social net- 
works where it is known that social relations can be lost 
but it is realistic in many other cases. For instance, in the 
network of scientific collaborations two scientists are said 
to be connected if they have co-authored a paper. It is 
clear that this connection cannot be lost in time because 
the fact that they have written a paper together cannot 
be changed. In general, if the connection between two vertices is given by the occurrence of a certain event (co-authoring a paper, being in the cast of the same film, having a sexual relation) in the past history then this connection cannot be lost and, therefore, our approximation holds.

Another crucial assumption is related to the fact that the transition from a potential edge to an edge has a higher probability of occurrence than the transition from disconnected to an edge. In fact, the connection of two disconnected vertices without a common neighbor is a process that models the creation of a social relation between two social entities chosen at random. We thus assume

$$\nu_{s\to e} = \frac{\mu_0}{N^2}. \qquad (29)$$

On the other hand, the creation of an edge between two vertices with a common neighbor, that is with a potential edge between them, models the creation of a social relation between two "friends" of a social entity. In this case we assume

$$\nu_{p\to e} = \frac{\mu_1}{N}. \qquad (30)$$

Under these approximations, and taking $\hat{d}_i \approx N$ for $d_i, d_i^* \ll N$, the system of equations (26) is reduced to

$$N\frac{\partial d_i}{\partial N} = \mu_0 + \mu_1 d_i^*, \qquad N\frac{\partial d_i^*}{\partial N} = \mu_0 d_i - \mu_1 d_i^*. \qquad (31)$$

FIG. 7: Schematic representation of the two evolution rules of the connecting nearest-neighbors model. Top: with probability $u$ a potential edge (dashed line) becomes an edge (continuous line). Bottom: with probability $1 - u$ a new vertex is added to the graph (disconnected vertex on the left), then it is connected with an edge to a vertex selected at random and by potential edges to its neighbors (right).



Hence, the existence of a linear preferential attachment (the growth rates of $d_i$ and $d_i^*$ are linear in themselves) in this class of models becomes evident with the introduction of the concept of potential edges. Thus, a power-law degree distribution is expected. This system of differential equations is linear and, therefore, can be easily integrated, with the result that, for $N \gg N_i$,

$$d_i(N) \sim d_i^*(N) \sim \left(\frac{N}{N_i}\right)^{\beta}, \qquad (32)$$

where $N_i$ is the size of the graph when vertex $i$ was added to it and

$$\beta = \frac{\mu_1}{2}\left(-1 + \sqrt{1 + 4\frac{\mu_0}{\mu_1}}\right). \qquad (33)$$



Now, if the vertices are added at a constant rate then $P(N_i) = 1/N$, yielding

$$P(d_i > d) = P\left[\left(\frac{N}{N_i}\right)^{\beta} > d\right] = \int_1^{N d^{-1/\beta}} \frac{dN_i}{N} \approx d^{-1/\beta}. \qquad (34)$$

Consequently,

$$p_d = -\frac{\partial P(d_i > d)}{\partial d} \sim d^{-\gamma}, \qquad (35)$$

with

$$\gamma = 1 + \frac{1}{\beta}. \qquad (36)$$

FIG. 8: Degree distribution of the connecting nearest neighbors model for different values of the addition rate $u$, graph size $N = 10^6$ and average over 100 realizations. The inset shows the exponent $\gamma$ obtained from the fit to the power law $p_d = a d^{-\gamma}$ (circles) together with the analytical prediction (continuous line).



Notice that the main ingredient leading to this power law behavior is given by Eq. (27). On the contrary, if $\nu_{s\to p}$ were independent of the vertex degree an exponential decay would be obtained.

We can also compute the clustering coefficient as a function of the vertex degree. The main contribution to the evolution of $e_i$, the number of edges among the neighbors of vertex $i$, is given by the transition potential edge $\to$ edge. In fact, if the potential edge connecting a vertex $i$ to another vertex $j$, with common neighbor $k$, becomes an edge then vertex $i$ gains one neighbor (vertex $j$) and a new edge among its neighbors (that connecting $j$ and $k$). Neglecting other contributions we have

$$\frac{\partial e_i}{\partial N} = \nu_{p\to e} d_i^* = \mu_1\frac{d_i^*}{N}. \qquad (37)$$



Integrating this equation using Eq. (32), it results that

$$\langle c\rangle_d = \frac{2e(d)}{d(d-1)} \sim \frac{1}{d}. \qquad (38)$$

Thus, once again we obtain the inverse proportionality between $\langle c\rangle_d$ and the vertex degree $d$, in this case due to the conversion of potential edges between vertices with a common neighbor into edges.

FIG. 9: Clustering coefficient as a function of vertex degree of the connecting nearest neighbors model for different values of the addition rate $u$, graph size $N = 10^6$ and average over 100 realizations. The solid line is a power law decay with exponent 0.6.

3. Connecting nearest-neighbors model 

To check these results we have made numerical simulations of a variant of the DEB model. Starting with a single vertex and an empty set of edges, iteratively apply the following rules:

• With probability $1 - u$ introduce a new vertex in the graph and create an edge from the new vertex to a vertex $j$ selected at random (implying the creation of a potential edge between the new vertex and all the neighbors of $j$).

• With probability $u$ convert one potential edge selected at random into an edge.

A schematic representation of these rules is shown in Fig. 7. Actually, in the DEB model the number of vertices is fixed and each time a new vertex is added one vertex is removed from the graph. We consider the growing variant because in this case it is easier to determine some properties analytically. For very large $N$ we expect that both variants have the same qualitative behavior.
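The two rules of the growing variant can be sketched in a few dozen lines. This is an illustrative implementation rather than the code used for the reported simulations: the class and function names are invented here, steps in which no potential edge exists are simply skipped, and the set of potential edges is kept consistent with its definition after every new edge (an assumed reading, since the rules above only spell out the potential edges created by a new vertex).

```python
import random


class PotentialEdges:
    """Unordered vertex pairs with O(1) add, discard and uniform sampling."""

    def __init__(self):
        self.items = []   # list of pairs, used for uniform sampling
        self.index = {}   # pair -> position in self.items

    @staticmethod
    def _key(i, j):
        return (i, j) if i < j else (j, i)

    def add(self, i, j):
        pair = self._key(i, j)
        if pair not in self.index:
            self.index[pair] = len(self.items)
            self.items.append(pair)

    def discard(self, i, j):
        pos = self.index.pop(self._key(i, j), None)
        if pos is None:
            return
        last = self.items.pop()
        if pos < len(self.items):      # the removed pair was not the last one
            self.items[pos] = last
            self.index[last] = pos

    def sample(self, rng):
        return self.items[rng.randrange(len(self.items))]

    def __len__(self):
        return len(self.items)


def add_edge(adj, pot, i, j):
    """Connect i and j, then update the potential edges this creates."""
    pot.discard(i, j)
    for a, b in ((i, j), (j, i)):
        for k in adj[b]:               # k becomes a second neighbor of a
            if k != a and k not in adj[a]:
                pot.add(a, k)
    adj[i].add(j)
    adj[j].add(i)


def cnn_graph(n, u, seed=0):
    """Grow a graph with n vertices; u is the conversion probability."""
    if not 0 <= u < 1:
        raise ValueError("u must satisfy 0 <= u < 1 for the graph to grow")
    rng = random.Random(seed)
    adj = [set()]                      # single vertex, no edges
    pot = PotentialEdges()
    while len(adj) < n:
        if rng.random() < u:
            if len(pot):               # convert a random potential edge
                i, j = pot.sample(rng)
                add_edge(adj, pot, i, j)
        else:                          # add a vertex linked to a random one
            j = rng.randrange(len(adj))
            adj.append(set())
            add_edge(adj, pot, len(adj) - 1, j)
    return adj, pot
```

The swap-and-pop container is a design choice that keeps the uniform selection of a potential edge cheap; any structure supporting random sampling and deletion would serve the same purpose.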

These evolution rules fit into the equations written above after setting

$$\mu_0 = 1, \qquad \mu_1 = \frac{u}{1-u}. \qquad (39)$$

Thus, from Eqs. (33) and (36) it follows that

$$\gamma(u) = 1 + \frac{2(1-u)}{u}\left(-1 + \sqrt{1 + 4\frac{1-u}{u}}\right)^{-1}, \qquad (40)$$

with the limiting cases

$$\gamma(0) = \infty, \qquad \gamma(1) = 2. \qquad (41)$$
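For reference, the mean-field exponent can be evaluated directly from Eqs. (33), (36) and (39). The helper names below are illustrative; the formulas are the ones quoted in those equations.

```python
import math


def beta_cnn(mu0, mu1):
    """Growth exponent of Eq. (33)."""
    return 0.5 * mu1 * (-1 + math.sqrt(1 + 4 * mu0 / mu1))


def gamma_cnn(u):
    """Degree exponent gamma(u) = 1 + 1/beta with mu0 = 1, mu1 = u/(1-u)."""
    if not 0 < u < 1:
        raise ValueError("u must lie strictly between 0 and 1")
    return 1 + 1 / beta_cnn(1.0, u / (1 - u))


# gamma decreases monotonically from large values (u -> 0) towards 2 (u -> 1).
sample = {u: gamma_cnn(u) for u in (0.001, 0.2, 0.5, 0.8, 0.999)}
```

At $u = 1/2$ the exponent equals $1 + (1 + \sqrt{5})/2 \approx 2.62$, and the two ends of the sampled range approach the limiting cases of Eq. (41).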




FIG. 10: Average degree among the neighbors of a vertex with degree $d$ of the connecting nearest neighbors model for different values of the addition rate $u$, graph size $N = 10^6$ and average over 100 realizations. The solid line is a power law growth with exponent 0.6.

Thus, the power law exponent $\gamma$ takes its minimum value when $u \to 1$, corresponding to a low rate of addition of vertices, and it grows with decreasing $u$, corresponding to higher rates of vertex addition. In Fig. 8 we plot the degree distribution as obtained from numerical simulations. For intermediate degrees it exhibits a power law decay $p_d \sim d^{-\gamma}$. The value of $\gamma$ obtained from the fit to the numerical data is shown in the inset, together with the analytical curve given by Eq. (40). The quantitative disagreement tells us that the mean-field Eq. (26) gives the right qualitative description but fluctuations should be considered to obtain a precise agreement with the numerical data.

In Fig. 9 we plot the clustering coefficient as a function of the vertex degree. It follows a power law decay for large degrees but with an exponent smaller than 1. On the other hand, the average neighbor degree as a function of the vertex degree is shown in Fig. 10. It increases with increasing $d$, i.e. the graphs generated using this model exhibit positive degree correlations. This result is in very good agreement with the observations made for social graphs, which are also characterized by positive degree correlations. Hence, the connecting nearest-neighbors mechanism generates many of the topological properties of social networks, including power law degree distributions and positive correlations.



V. DUPLICATION DIVERGENCE 

The evolution of some real graphs is given by a replication or partial replication of their local structure. An example is the genome, which evolves, among other mechanisms, through single gene or full genome duplications [82] and mutations that lead to the differentiation of the duplicate genes. The evolution of the genome can be translated into the evolution of the protein-protein interaction network, where each vertex represents the protein expressed by a gene. After gene duplication both expressed proteins will have the same interactions. This corresponds to the addition of a new vertex in the network with edges pointing to the neighbors of its ancestor. In addition, positive and negative mutations can be modeled by the creation and loss, respectively, of the edges leading to the divergence of the duplicates [28, 83]. The duplication mechanism has also been considered in the evolution of other biological networks [84]. Moreover, another example is given by the WWW, where new web pages may be created by making a copy or a partial copy of the hyperlinks present in other web pages [85]. In this case the duplication represents the copying process and the divergence the deletion or addition of hyperlinks in the duplicated pages.

In a first approximation we will assume that the processes of duplication and divergence are not coupled but take place independently of each other. Moreover, we will also assume that the creation and deletion of edges take place at random and that they are independent of the degree of the vertices at the edge ends, or any other topological property. Under these approximations, the evolution of the degree of a vertex (the number of interacting partners) is given by

$$\frac{\partial d_i}{\partial N} = \nu_D d_i + \nu_C (N - d_i) - \nu_L d_i, \qquad (42)$$



where $\nu_D$, $\nu_C$, and $\nu_L$ are the rates per unit of vertex added of duplications, edge creation and edge loss, respectively. By definition, each duplication implies the addition of a new vertex and, therefore,

$$\nu_D = \frac{1}{N}. \qquad (43)$$

We will further assume that

$$\nu_C = \frac{\mu_0}{N^2}, \qquad \nu_L = \frac{\mu_1}{N}, \qquad (44)$$

otherwise the stationary graph will be empty or fully connected, both being unrealistic. Notice that $\mu_0$ and $\mu_1$ are new parameters with no relation to those introduced in the previous section. Then, substituting Eqs. (43) and (44) into Eq. (42) we obtain, for $d_i \ll N$,

$$N\frac{\partial d_i}{\partial N} = \mu_0 + (1 - \mu_1) d_i. \qquad (45)$$



The linear dependency of the growth rate on $d_i$ evidences once again the existence of an effective linear preferential attachment. The integration of this equation yields

$$d_i(N) = \left[d_i(N_i) + \frac{\mu_0}{1-\mu_1}\right]\left(\frac{N}{N_i}\right)^{\beta} - \frac{\mu_0}{1-\mu_1}, \qquad (46)$$

where $N_i$ and $d_i(N_i)$ are the graph size and the degree of vertex $i$ when vertex $i$ was added to the graph, and

$$\beta = 1 - \mu_1. \qquad (47)$$

Here we have implicitly assumed that

$$\mu_1 < 1, \qquad (48)$$

otherwise the stationary state will be an empty graph.
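The closed form of Eq. (46) can be checked against the growth law $N\,\partial d_i/\partial N = \mu_0 + (1 - \mu_1) d_i$ of Eq. (45) with a finite difference. The sketch below uses an illustrative function name and arbitrary parameter values chosen only for the check.

```python
def degree_growth(n, n_i, d_init, mu0, mu1):
    """Degree at graph size n of a vertex added at size n_i, as in Eq. (46)."""
    beta = 1 - mu1
    shift = mu0 / beta
    return (d_init + shift) * (n / n_i) ** beta - shift


# Arbitrary illustrative values (mu1 < 1 as required by Eq. (48)).
mu0, mu1, n_i, d_init = 0.5, 0.3, 10.0, 3.0
n, h = 1000.0, 0.1

# Left-hand side of Eq. (45), N dd/dN, estimated by a central difference.
lhs = n * (degree_growth(n + h, n_i, d_init, mu0, mu1)
           - degree_growth(n - h, n_i, d_init, mu0, mu1)) / (2 * h)
# Right-hand side of Eq. (45) evaluated on the closed form.
rhs = mu0 + (1 - mu1) * degree_growth(n, n_i, d_init, mu0, mu1)
```

The two sides agree to within the accuracy of the finite difference, and the closed form reduces to the initial degree at $N = N_i$.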
From Eq. (46) it follows that

$$P(d_i > d) = P\left[\left(d_i(N_i) + \frac{\mu_0}{1-\mu_1}\right)\left(\frac{N}{N_i}\right)^{\beta} - \frac{\mu_0}{1-\mu_1} > d\right]. \qquad (49)$$



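This probability should be computed taking into account that both $N_i$ and $d_i(N_i)$ are random variables. If the duplications take place at a constant rate then the probability density of $N_i$ is given by $P(N_i) = 1/N$. Moreover, the probability that a vertex has degree $d_i(N_i)$ when it is introduced is just the probability that its ancestor has this degree. If the graph is in a stationary state then $P[d_i(N_i) = d] = p_d$ is just the degree distribution. Hence

$$P(d_i > d) = \int_1^N \frac{dN_i}{N} \sum_{d'} p_{d'}\, \Theta\left[\left(d' + \frac{\mu_0}{1-\mu_1}\right)\left(\frac{N}{N_i}\right)^{\beta} - \frac{\mu_0}{1-\mu_1} > d\right], \qquad (50)$$

where $\Theta[\cdot]$ equals 1 when the inequality holds and 0 otherwise. For $N \gg 1$ we finally obtain

$$p_d = -\frac{\partial P(d_i > d)}{\partial d} \sim \left(d + \frac{\mu_0}{1-\mu_1}\right)^{-\gamma}, \qquad (51)$$

with

$$\gamma = 1 + \frac{1}{1-\mu_1}. \qquad (52)$$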



The origin of this power law degree distribution is determined by the second term in the right hand side of Eq. (45), associated with the vertex duplications and subsequent edge loss. These are local mechanisms and, as in the models described before, they lead to an effective preferential attachment manifested as a power law degree distribution.

FIG. 11: Schematic representation of the coupled duplication-divergence model evolution rules. Left and middle: a vertex ({>) is being duplicated. Right: the divergence of the duplicates is manifested as a coupled loss of interactions, where the coupling is given by the restriction that for each neighbor (•) at least one of the duplicates should preserve an edge to it. Moreover, due to the existence of self-interactions, a new edge can be created between the duplicates (dashed line).

The next step is thus to investigate if the duplication-divergence model satisfies the inverse proportionality between the average clustering coefficient and vertex degree. If the creation of new interactions takes place at random, i.e. they appear between randomly chosen vertices, then the average clustering coefficient will be negligible for large graph sizes $N$. There is however one source of new interactions giving an appreciable contribution. In the duplication process, if the ancestor is a self-interacting protein then the ancestor and the duplicate may have an interaction between them [28]. Let us assume that this happens with a probability $q_v$. Thus, if a neighbor of a vertex $i$ is duplicated, vertex $i$ will gain a new neighbor (the copy) and with probability $q_v$ an edge between its neighbors (that between the copy and its ancestor) and therefore

$$\frac{\partial e_i}{\partial t} = q_v\frac{\partial d_i}{\partial t}, \qquad (53)$$



where we have neglected any other process leading to new interactions and edge loss. The integration of this equation yields

$$\langle c\rangle_d = \frac{2e(d)}{d(d-1)} \approx \frac{2q_v}{d}. \qquad (54)$$

Hence, under these assumptions we obtain the inverse proportionality behavior. The inclusion of the edge loss may change this result. We do not have any analytical proof, but since this process contributes to the loss of triangles and has a higher impact on high degree vertices, we expect that $\langle c\rangle_d$ would decay faster than $d^{-1}$.



4. Coupled duplication-divergence model

In some practical cases the processes of duplication and divergence cannot be decoupled. For instance, the protein-protein interaction network has a functional role in the organism and, therefore, the loss of certain interactions can result in the death of the corresponding organism. According to the classical model [83], after duplication the duplicate genes have fully overlapping functions. Later on, one of the copies may either become nonfunctional due to degenerative mutations or it can acquire a novel beneficial function and become preserved by natural selection. In a more recent framework [86, 87] it is proposed that both duplicate genes are subject to degenerative mutations, losing some functions but jointly retaining the full set of functions present in the ancestral gene. To investigate the influence of the coupling between duplication and divergence we consider the following model introduced in Ref. [50]. At each time step a vertex is added according to the following rules:

• Duplication: a vertex $i$ is selected at random. A new vertex $i'$ with an edge to all the neighbors of $i$ is created. With probability $q_v$ an edge between $i$ and $i'$ is established (self-interacting proteins).

• Divergence: for each of the vertices $j$ connected to $i$ and $i'$ we choose randomly one of the two edges $(i, j)$ or $(i', j)$ and remove it with probability $1 - q_e$.

A schematic representation of these rules is shown in Fig. 11. A similar model with an asymmetric divergence has
been introduced in Ref. For practical purposes 

the algorithm starts with two connected vertices and we repeat the duplication-divergence rules $N$ times. Since genome evolution analysis [28, 88] supports the idea that the divergence of duplicate genes takes place shortly after the duplication, we can assume that the divergence process always occurs before any new duplication takes place; i.e., there is a time scale separation between duplication and mutation rates. This allows us to consider the number of vertices in the network, $N$, as a measure of time (in arbitrary units). It is worth remarking that the algorithm does not include the creation of new edges, i.e. the development of new interactions between gene products, other than those due to self-interactions. However, we have tested that the introduction in the coupled duplication-divergence algorithm of a probability to develop new random connections does not change the network topology substantially.
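The duplication and divergence rules translate almost line by line into code. The following sketch is one possible reading of them, with invented function and variable names; in particular, it applies the divergence step only to the neighbors shared by the ancestor and its copy, and it grows the graph until it reaches `n` vertices.

```python
import random


def duplication_divergence_graph(n, q_v, q_e, seed=0):
    """Grow an undirected graph with n vertices; adj[v] is a set of neighbors."""
    rng = random.Random(seed)
    adj = [{1}, {0}]                       # start from two connected vertices
    while len(adj) < n:
        i = rng.randrange(len(adj))        # vertex to be duplicated
        new = len(adj)
        shared = sorted(adj[i])            # neighbors inherited by the copy
        adj.append(set(shared))
        for j in shared:
            adj[j].add(new)
        if rng.random() < q_v:             # self-interaction: link i and copy
            adj[i].add(new)
            adj[new].add(i)
        for j in shared:                   # divergence of the two duplicates
            if rng.random() < 1 - q_e:
                loser = i if rng.random() < 0.5 else new
                adj[loser].discard(j)
                adj[j].discard(loser)
    return adj
```

Because only one of the two edges $(i, j)$ and $(i', j)$ can be removed, each neighbor keeps at least one of the duplicates, which is the coupling between duplication and divergence described above. In the extreme case $q_e = 0$, $q_v = 0$ every duplication adds and removes the same number of edges, so the edge count never changes.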

In order to provide a general analytical understanding of the model, we use a mean-field approach for the behavior of the moments of the degree distribution. Let $\langle d\rangle(N)$ be the average degree of the network with $N$ vertices. After a duplication event $N \to N + 1$ the average degree is given by

$$\langle d\rangle(N+1) = \frac{N\langle d\rangle(N) + 2q_v + 2q_e\langle d\rangle(N)}{N+1}. \qquad (55)$$



On average, there is a gain proportional to $2q_v$ because of the interaction between duplicates and to $2\langle d\rangle(N)$ because of duplication, and a loss proportional to $2(1 - q_e)\langle d\rangle(N)$ due to the divergence process. For large $N$, taking the continuum limit, we obtain a differential equation for $\langle d\rangle$. For $q_e < 1/2$, $\langle d\rangle$ grows with $N$ but saturates to the stationary value $\langle d\rangle = 2q_v/(1 - 2q_e) + O(N^{2q_e - 1})$. On the contrary, for $q_e > 1/2$, $\langle d\rangle$ grows with $N$ as $N^{2q_e - 1}$. At $q_e = q_1 = 1/2$ there is a dramatic change of behavior in the large scale degree properties. Analogous equations can be written for higher order moments $\langle d^l\rangle$. Using a rate equations approach similar to that considered in Ref. [89] it is obtained that

$$\frac{\partial n_d}{\partial N} = A_{d-1} n_{d-1} - A_d n_d - \frac{n_d}{N} + 2q_v G_{d-1} + 2(1 - q_v) G_d, \qquad (56)$$

where

$$A_d = \frac{1}{N}(q_v + q_e d), \qquad (57)$$

$$G_d = \frac{1}{N}\sum_{d' \ge d} n_{d'} \binom{d'}{d}\left(\frac{1+q_e}{2}\right)^{d}\left(\frac{1-q_e}{2}\right)^{d'-d}. \qquad (58)$$

FIG. 12: The exponent $\sigma_l(q_e)$ as a function of $q_e$ for different values of $l$. The symbols were obtained from numerical simulations of the model. The moments $\langle d^l\rangle$ were computed as a function of $N$ in networks with size ranging from $N = 10^3$ to $N = 10^6$. The exponents $\sigma_l(q_e)$ are obtained from the power law fit of the plot $\langle d^l\rangle$ vs. $N$. In the inset we show the corresponding mean-field behavior, as obtained from Eq. (60), which is in qualitative agreement with the numerical results.

FIG. 13: Clustering coefficient as a function of vertex degree of the coupled duplication-divergence model for different values of $q_e$, graph size $N = 10^6$ and average over 100 realizations. The solid line is a power law decay with exponent 1.

FIG. 14: Average degree among the neighbors of a vertex with degree $d$ of the coupled duplication-divergence model for different values of $q_e$, graph size $N = 10^6$ and average over 100 realizations. The solid line is a power law decay with exponent 0.1.



The first two terms in the right hand side of Eq. (56) result from the duplication of a neighbor of a vertex (with probability $q_e d/N$) and the duplication of a vertex with the creation of an edge between the duplicates (with probability $q_v/N$), yielding the attachment rate in Eq. (57). Moreover, the last three terms are given by the divergence of the duplicates, where with probability $n_d/N$ a vertex with degree $d$ is replaced by two duplicates (factor two in the last two terms). Thus, the coupling of the duplication and divergence mixes the equations for different $n_d$. We cannot give an exact derivation of $n_d$ but we can compute the moments of the degree distribution $\langle d^l\rangle$. Multiplying Eq. (56) by $d^l$ and summing over $d$ we obtain

$$\langle d^l\rangle \sim N^{\sigma_l(q_e)}, \qquad (59)$$

where

$$\sigma_l(q_e) = l q_e + 2\left[\left(\frac{1+q_e}{2}\right)^{l} - 1\right], \qquad (60)$$

provided $\sigma_l(q_e) > 0$. If $\sigma_l(q_e) < 0$ the corresponding moment approaches a stationary value for large $N$. For all $l$ we find a value $q_l$ at which the moments cross from a divergent behavior to a finite value for $N \to \infty$. In particular for $l = 1$ we have $q_1 = 1/2$ (as obtained above) and for $l = 2$ we obtain $q_2 = 2\sqrt{3} - 3 \approx 0.46$. Moreover, the nonlinear behavior with $l$ is indicative of a multifractal degree distribution.
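The thresholds $q_l$ follow from the exponent $\sigma_l(q_e) = l q_e + 2\{[(1 + q_e)/2]^l - 1\}$ of Eq. (60) by a one-dimensional root search. The sketch below, with illustrative names, locates them by bisection. It also iterates a mean-degree recursion as a cross-check of the $l = 1$ case; that recursion is written from the net gain stated in the text ($2q_v$ from self-interaction plus $2q_e\langle d\rangle$ from duplication minus divergence), which is an interpretation made for this sketch.

```python
import math


def sigma(l, q_e):
    """Mean-field exponent of the l-th moment, Eq. (60)."""
    return l * q_e + 2 * (((1 + q_e) / 2) ** l - 1)


def critical_q(l, tol=1e-12):
    """Root q_l of sigma(l, q) on [0, 1]; sigma is increasing in q."""
    lo, hi = 0.0, 1.0          # sigma(l, 0) < 0 and sigma(l, 1) = l > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sigma(l, mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mean_degree(n_final, q_v, q_e, d0=1.0, n0=2):
    """Iterate the mean degree from two connected vertices (d0 = 1, n0 = 2)."""
    d = d0
    for n in range(n0, n_final):
        # Degree sum grows by 2 q_v + 2 q_e <d> per duplication event.
        d = (n * d + 2 * q_v + 2 * q_e * d) / (n + 1)
    return d


q1 = critical_q(1)
q2 = critical_q(2)
```

The bisection reproduces $q_1 = 1/2$ and $q_2 = 2\sqrt{3} - 3$, and for $q_e < 1/2$ the recursion approaches $2q_v/(1 - 2q_e)$ with a correction that decays as $N^{2q_e - 1}$.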
In order to support the analytical calculations, we have performed numerical simulations of the coupled duplication-divergence model with graph size ranging from $N = 10^3$ to $10^6$. In Fig. 12 we report the generalized exponents $\sigma_l(q_e)$ as a function of the divergence parameter $q_e$. As predicted by the analytical calculations, $\sigma_l = 0$ at a critical value $q_l$. The general phase diagram obtained is in good qualitative, but not quantitative, agreement with the mean-field predictions and the multifractal picture. Noticeably, multifractal features are also present in a recently introduced model of growing networks [49] where, in analogy with the duplication process, newly added vertices inherit the network degree properties from parent vertices. Multifractality thus appears related to local inheritance mechanisms. Multifractal distributions have a rich scaling structure where the scale-free behavior is characterized by a continuum of exponents. This behavior is, however, opposite to usual exponentially bounded distributions. Even if the evolution rules of the coupled duplication-divergence model are local they introduce an effective linear preferential attachment. However, because the edge deletion of duplicate vertices introduces additional heterogeneity in the problem we obtain a multifractal behavior.

The coupling between duplication and divergence is however less relevant to determine the scaling of the average clustering coefficient with vertex degree. In fact, for the coupled duplication-divergence model Eq. (53) also applies, obtaining the inverse proportionality in Eq. (54). In Fig. 13 we plot $\langle c\rangle_d$ vs. $d$ for different values of $q_e$, manifesting a power law decay but with an exponent larger than 1. With decreasing $q_e$ (increasing the loss of edges) the power law decay deviates more and more from the predicted behavior $\langle c\rangle_d \sim d^{-1}$. This picture corroborates our hypothesis that if the edge loss is sufficiently large then a faster decay should be observed.

On the other hand, the average neighbor degree as a function of the vertex degree for different values of $q_e$ is depicted in Fig. 14. Negative degree correlations are manifested by a power law decay $\langle d_{nn}\rangle_d \sim d^{\alpha}$ with $\alpha < 0$. The existence of negative degree correlations has actually been reported in Ref. [90] for a protein-protein interaction network. Moreover, a model based on these correlations has also been proposed in Ref. [91].

VI. DISCUSSION AND CONCLUSIONS

Mechanism                 $\langle c\rangle_d \sim d^{-\beta}$    $\langle d_{nn}\rangle_d \sim d^{\alpha}$
Connecting neighbors      $0 < \beta < 1$                         $\alpha > 0$
Random walk               $\beta = 1$                             $\alpha < 0$
Duplication-divergence    $\beta > 1$                             $\alpha < 0$

TABLE I: Summary of the correlation properties of the different models analyzed here.

After analyzing these models we can conclude that growing networks based on local evolution rules exhibit an effective linear preferential attachment. The general principle behind it is the following. It is true that when we take a vertex at random the selection does not imply any degree preference, other than the one imposed by the degree distribution. However, if we take a neighbor of that vertex then some preference is induced. In fact the probability that vertex $i$ is a neighbor of the randomly selected vertex is simply

$$\frac{d_i}{N},$$

which is exactly the linear preferential attachment con- 
sidered in the BA model [l!j. Therefore, the connection 
to a neighbor of a vertex selected at random leads to an 
effective linear preferential attachment. 
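This counting argument is easy to verify on a small example. The sketch below, with an illustrative function name and an adjacency dictionary assumed as input format, computes for every vertex the exact probability of being a neighbor of a uniformly selected vertex, which comes out as its degree divided by the number of vertices.

```python
from fractions import Fraction


def neighbor_probability(adj):
    """Probability that each vertex is a neighbor of a uniformly chosen vertex.

    `adj` maps each vertex to the set of its neighbors (undirected graph).
    """
    n = len(adj)
    hits = {v: 0 for v in adj}
    for v, nbrs in adj.items():      # v plays the role of the selected vertex
        for w in nbrs:
            hits[w] += 1             # w is a neighbor of the selected vertex
    return {v: Fraction(c, n) for v, c in hits.items()}


# Star graph: the hub 0 is a neighbor of four of the five vertices.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
probs = neighbor_probability(star)
```

In the star graph the hub is reached with probability 4/5 and each leaf with probability 1/5, i.e. in proportion to their degrees.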

Another important consequence of the local models considered above is the inverse proportionality between the average clustering coefficient and the vertex degree, or more generally $\langle c\rangle_d \sim d^{-\beta}$. This result is determined by the fact that when a new edge is created to a vertex then with a certain probability an edge will also be created to one or more of its neighbors. Thus, locality is again a crucial point. On the other hand, even if we were not able to find an analytical explanation, these local models are also characterized by degree correlations among connected vertices.

These features are observed in the three models analyzed here and are summarized in Table I. They describe different systems such as technological, social and biological networks, that appear unrelated from the definition of their evolution rules. The detailed analysis performed here reveals that their main ingredient, namely that they are local models of growing networks, explains the existence of strong similarities in their topological properties. These
observations can be extended to other local models pro- 
posed in the literature. An example is the model intro- 
duced in Ref. , where each time a vertex is added it is 
connected to both ends of an edge selected at random. It 
can be easily shown that this rule also introduces an effec- 
tive linear preferential attachment, clustering hierarchy 
and degree correlations. Another example is the deacti- 
vation model |60j, where new vertices are connected to 
a small subset of connected vertices. A detailed study of
its topology [6j| reveals the existence of clustering hier- 
archy and degree correlations. 

In conclusion, the growing models with local rules ex- 
hibit some of the common features of real graphs. They 
are characterized by an effective preferential attachment, 
an average clustering coefficient that decreases with in- 
creasing vertex degree, and degree correlations. The lo- 
cal knowledge is then a general principle determining the 
topology of growing complex networks. 

I thank A. Vespignani, Y. Moreno and A.-L. Barabasi 
for helpful comments and discussion. 